=============================================================================================
LinkCluE 
Version 1.0,  2010-July-08

This package is distributed under the LGPL license (see GPL_License.txt).
Copyright (c) 2010 Iam-on & Garrett.
=============================================================================================

LinkCluE is a MATLAB package for Link-based Cluster Ensemble algorithms:
   - CTS: Connected-Triple based Similarity method
   - SRS: SimRank based Similarity method
   - ASRS: Approximate SimRank based Similarity method
as presented in 
   N. Iam-on, T. Boongoen, and S. Garrett. "Refining pairwise similarity matrix for 
   cluster ensemble problem with cluster relations". In Proceedings of Eleventh 
   International Conference on Discovery Science, October 2008, Budapest, Hungary, 
   pages 222-233.

How to use the package:
   (1) Copy all .m files to the current directory of your MATLAB environment or to a directory on your MATLAB path.
   (2) In the MATLAB command window:
	To run an illustrative example of the Link-based Cluster Ensemble algorithms, type
	     	[CR,V] = LinkCluETest
	where 'CR' and 'V' are matrices of clustering results and the corresponding validity measures, respectively.
	To get the description of the function, type
	     	help LinkCluETest
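
The two steps above can be sketched as a short script. The folder name 'LinkCluE' is only a hypothetical location for the unpacked .m files, so adjust it to your own setup; the script runs the demo only if LinkCluETest.m is actually found.

```matlab
% Hypothetical location of the unpacked package; change it to your own folder.
pkgDir = fullfile(pwd, 'LinkCluE');

% Make the .m files visible to MATLAB if the folder exists.
if exist(pkgDir, 'dir')
    addpath(pkgDir);
end

% Run the demo only when the package is actually on the path.
if exist('LinkCluETest', 'file') == 2
    [CR, V] = LinkCluETest;   % CR: clustering results, V: validity measures
    help LinkCluETest         % prints the function description
else
    disp('LinkCluETest.m not found; check the folder name above.');
end
```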

For questions and comments please contact:
Miss Natthakan Iam-on
nii07@aber.ac.uk
http://users.aber.ac.uk/nii07/




==============================================================================================
Functions in the package:
Note: to get the description of a function, type "help" followed by the function name in the MATLAB command window.
==============================================================================================

A. Format of Input Files:
----------------------------------
   - Data file: rows correspond to observations; columns correspond to variables (class labels must be excluded),
	  see the example file "SampleData\FGD.csv".
   - Truelabels file (optional, used when the true cluster labels are known): n-by-1 vector of known cluster labels for all data points,
	  see the example file "SampleData\FGT.csv".
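
As a rough illustration of the two layouts, the sketch below writes a toy data matrix and a matching label vector to temporary CSV files and reads them back. The values, variable names and temporary file names are made up for this example and are not part of the package.

```matlab
% Toy data standing in for a real data set: 4 observations, 2 variables.
X = [1.0 2.0; 1.2 1.9; 8.0 8.5; 8.1 8.3];
% Known cluster labels for the same 4 observations (n-by-1 vector).
T = [1; 1; 2; 2];

% Write both files in the layout described above (hypothetical temp file names).
dataFile  = [tempname '.csv'];
labelFile = [tempname '.csv'];
csvwrite(dataFile, X);
csvwrite(labelFile, T);

% Read them back the way a caller might before invoking the package.
Xin = csvread(dataFile);
Tin = csvread(labelFile);

% Sanity checks: one label column, and one label per observation.
assert(size(Tin, 2) == 1);
assert(size(Xin, 1) == size(Tin, 1));

% Clean up the temporary files.
delete(dataFile);
delete(labelFile);
```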


B. Illustrative Example:
--------------------------------
   - LinkCluETest.m: demonstrates how to set up the input arguments and use the Link-based Cluster Ensemble algorithms.
   - LinkCluE.m: presents the process of Link-based Cluster Ensembles.


C. The process of Link-based Cluster Ensemble includes:
------------------------------------------------------------------------------------
(1) Generating a cluster ensemble
	- crEnsemble.m: generates a cluster ensemble with the k-means algorithm, using one of two schemes (fixed k or random k) for selecting the number of clusters.
(2) Constructing a similarity matrix
          Three functions create link-based similarity matrices:
	- cts.m: constructs the CTS matrix
	- srs.m: constructs the SRS matrix (for more than 1000 data points, the computation may take a long time)
	- asrs.m: constructs the ASRS matrix
(3) Performing consensus functions
          The similarity matrices generated by the functions in (2) can be passed to any similarity-based consensus function,
          for instance the built-in hierarchical clustering functions in MATLAB.
	- clHC.m: performs the final clustering using hierarchical algorithms (Single-Linkage: SL, Complete-Linkage: CL, and Average-Linkage: AL) as consensus methods.
(4) Evaluating clustering results
 	- cleval.m: computes validity scores for clustering results and displays a comparison bar chart.
   (4.1) Label-oriented validity criteria assess the degree of agreement between two data partitions, 
         where one partition is obtained from a clustering algorithm and the other is taken from 
         prior information, i.e. the known labels of the data.
	- valid_CA.m: computes Classification Accuracy (CA), which ranges over [0, 1].
	- valid_RandIndex.m: computes Rand Index (RI), Adjusted Rand Index (AR), Mirkin's Index and Hubert's Index. The code is 
	  from David Corney (D.Corney@cs.ucl.ac.uk), who holds the copyright.
   (4.2) Label-irrelevant validity criteria measure the goodness of a clustering solution using only quantities 
         and features inherited from the dataset. They are usually used when true cluster labels are unknown.
	- valid_compactness.m: computes the Compactness (CP) of clustering solutions.
	- valid_DbDunn.m: computes the Davies-Bouldin Index (DB) and the Dunn index. The code is from Kaijun Wang 
	  (sunice9@yahoo.com), as part of the Cluster Validity Analysis Platform (CVAP) package.
Note1: CP and DB ==> lower values indicate better cluster structure
       Dunn, AR, RI and CA ==> higher values indicate better cluster quality
Note2: Other cluster validity criteria can be found in the CVAP package (version 3.4) at:
http://www.mathworks.com/matlabcentral/fileexchange/loadAuthor.do?objectType=author&objectId=1095267
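
Steps (1) to (4) above can be sketched end to end with MATLAB's own functions. This is a minimal illustration, not the package code: it needs the Statistics Toolbox (kmeans, linkage, cluster, squareform), it substitutes a plain co-association matrix for the CTS, SRS and ASRS matrices, and the toy data, the variable names and the range used for the random k are all assumptions made for the example.

```matlab
% Toy data: two well-separated groups of 10 points each in 2-D (made up).
X = [randn(10, 2); randn(10, 2) + 10];
truth = [ones(10, 1); 2 * ones(10, 1)];
n = size(X, 1);

% (1) Cluster ensemble: M k-means runs with a randomly chosen k per run.
% Assumption: k is drawn from 2..ceil(sqrt(n)); the package may use another range.
M = 10;
kMax = ceil(sqrt(n));
E = zeros(n, M);                       % column m holds the labels of run m
for m = 1:M
    k = randi([2, kMax]);
    E(:, m) = kmeans(X, k, 'EmptyAction', 'singleton');
end

% (2) Similarity matrix. Swapped-in technique: plain co-association
% (fraction of runs in which two points share a cluster), not CTS/SRS/ASRS.
S = zeros(n);
for m = 1:M
    S = S + bsxfun(@eq, E(:, m), E(:, m)');
end
S = S / M;

% (3) Consensus function: average-linkage hierarchical clustering on 1 - S.
K = 2;                                 % desired number of final clusters
Z = linkage(squareform(1 - S), 'average');
final = cluster(Z, 'maxclust', K);

% (4) Label-oriented evaluation: Rand Index against the known labels,
% i.e. the fraction of point pairs on which the two partitions agree.
sameFinal = bsxfun(@eq, final, final');
sameTruth = bsxfun(@eq, truth, truth');
pairs = triu(true(n), 1);              % each unordered pair once
RI = sum(sameFinal(pairs) == sameTruth(pairs)) / nnz(pairs);
fprintf('Rand Index = %.3f\n', RI);
```

Replacing 'average' with 'single' or 'complete' in the linkage call gives the other two consensus variants listed in (3).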
          

D. Auxiliary functions:
------------------------------
   - relabelCl.m: relabels all clusters in the ensemble (auxiliary function for creating the three link-based similarity matrices).
   - showBar.m: shows a bar chart for clustering validity comparison.
   - stod.m: converts similarity values to distance values and changes the matrix format from square to vector (the input format of a linkage function).
   - valid_sumsquares.m: auxiliary function for valid_DbDunn.m.
   - weightCl.m: computes the weight for each pair of clusters from their shared members (auxiliary function for creating the 'CTS' and 'ASRS' similarity matrices).
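
The kind of conversion described for stod.m can be illustrated on a tiny matrix. The sketch below assumes that distance is taken as 1 minus similarity, which may differ from what stod.m actually does, and the matrix values are made up.

```matlab
% A small symmetric similarity matrix with ones on the diagonal (made-up values).
S = [1.0 0.8 0.1;
     0.8 1.0 0.3;
     0.1 0.3 1.0];

% Assumed conversion: distance = 1 - similarity.
D = 1 - S;

% Square-to-vector: take the strictly lower triangle column by column,
% which yields the pair order (1,2), (1,3), (2,3) used by pdist and linkage.
n = size(D, 1);
d = D(tril(true(n), -1))';

disp(d);   % the three pairwise distances as a row vector
```

With the Statistics Toolbox installed, squareform(D) should give the same row vector, which can then be passed directly to linkage.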


